In this paper we study goodness-of-fit testing of single-index models.
The large sample behavior of certain score-type test statistics is
investigated. As a by-product, we obtain asymptotically distribution-free
maximin tests for a large class of local alternatives. Furthermore,
characteristic function based goodness-of-fit tests are proposed which
are omnibus and able to detect peak alternatives. Simulation results
indicate that the approximation through the limit distribution is
acceptable already for moderate sample sizes. Applications to two real
data sets are illustrated.

1. Introduction. Suppose that a response variable $Y$ depends on a vector
$X = (x_1, \ldots, x_p)^T$ of covariates, where $T$ denotes transposition. We may then
decompose $Y$ into a function $m(X)$ of $X$ and a noise variable $\varepsilon$, which is
orthogonal to $X$, that is, for the conditional expectation of $\varepsilon$ given $X$ we
have $E(\varepsilon \mid X) = 0$. When $Y$ is unknown, the optimal predictor of $Y$ given
$X = x$ equals $m(x)$. Since in practice the regression function $m$ is unknown,
statistical inference about $m$ is an important issue. In a purely parametric
framework, $m$ is completely specified up to a parameter. For example, in
linear regression $m(x) = \beta^T x$, where $\beta$ is an unknown $p$-vector which needs
to be estimated from the available data. Slightly more generally we may
consider $m(x) = \Phi(\beta^T x)$, where the link function $\Phi$ may be nonlinear but
is again specified. This is the so-called generalized linear model.

When $\Phi$ remains unspecified, we arrive at a semiparametric model which
is more flexible on the one hand and, on the other hand, avoids the curse
of dimensionality one faces in fully nonparametric models. The estimation
of $\beta$, as well as of the link function, in this so-called single-index model
was studied by, among others, Li and Duan [25], Härdle, Hall and Ichimura
[16], Ichimura [23] and Hristache, Juditsky and Spokoiny [22]. Related work
is [6] and [20]. Clearly, any statistical analysis within the model, to avoid
wrong conclusions, should be accompanied by a check of whether the model
is valid at all. For the single-index model the diagnostic methods are less
elaborate. We only mention Fan and Li [14], Aït-Sahalia, Bickel and Stoker
[1] and Xia, Li, Tong and Zhang [38] here but come back to them later;
see Discussion 2.6, where we are prepared to compare their approaches and
results with ours. The paper by Härdle, Mammen and Proença [19] considers
a parametric link structure and therefore does not fall into the area studied
in this paper.

In the present paper, we aim at developing some formal tests for model 
checking when the link function remains unspecified. 

For more specified regression models the literature is much more elaborate.
To review only a few contributions, Cox, Koh, Wahba and Yandell
[8] introduced tests of the null hypothesis that a regression function has a
particular parametric structure. Azzalini, Bowman and Härdle [3] considered
nonparametric regression as an aid to model checking. Cox and Koh
[7] developed spline-based tests of model adequacy. Eubank and Spiegelman
[11] considered spline approaches to testing the goodness of fit of a linear
model. Simonoff and Tsai [28] proposed diagnostic methods for assessing
the influence of individual data values on goodness-of-fit tests based on
nonparametric regression. Gu [15] used spline methods in a diagnostic approach
to model fitting. Azzalini and Bowman [2] used nonparametric regression
to check linear relationships. Eubank and LaRiccia [10] derived properties
of two-sided tests in nonparametric regression based on Fourier methods.
Härdle and Mammen [17] considered comparisons between parametric and
nonparametric fits and used the wild bootstrap for the computation of critical
regions. Härdle, Mammen and Müller [18] investigated testing for parametric
versus semiparametric modeling in generalized linear models, again
using the wild bootstrap.

Note, however, that any test using a nonparametric regression estimator
runs into an ill-posed problem requiring the choice of a smoothing parameter.
Therefore, an alternative approach was developed which circumvents
these problems. To name only a few papers, Bierens [4] proposed to check a
parametric regression model by investigating the sum of properly weighted
residuals. See also [5] for an informative discussion of the resulting tests
when local alternatives are considered. In Stute [33] a method was studied
which is based on the integrated regression function and which corresponds
to cumulative quantities such as empirical distribution functions or ranks
known from other areas in statistics. In this setup the author was able to
derive a principal components decomposition of the underlying test process,
which is extremely useful for the design of optimal tests versus local alternatives
and for understanding the impact of the design distribution and the noise
variance on the power of the tests. In particular, optimal Neyman-Pearson
tests which are based on linear rather than quadratic test statistics can be
obtained from this decomposition. Stute, González Manteiga and Presedo
Quindimil [35] studied the quality of the distributional approximation of an
associated cusum process via the wild bootstrap, while Stute, Thies and Zhu
[36] proposed an innovation process approach so as to obtain asymptotically
distribution-free and optimal tests. Finally, Stute and Zhu [37] developed
nonparametric testing for the validity of a generalized linear model, which
is based on a proper transformation of a residual empirical process and
which perfectly adapts to a situation when the design vector is elliptically
contoured.

In the framework of the single-index model the link function is unknown
and, as part of the testing procedure, needs to be estimated in a nonparametric
way. From our preceding remarks on ill-posedness, one might conclude
that nonparametric estimation of the link function necessarily excludes the
possibility of constructing tests which have optimal power versus local
alternatives converging to the null model at the rate $n^{-1/2}$. Fortunately, as this
paper will show, this pessimistic view is not justified. To obtain such tests,
rather than comparing the estimator of $\Phi$ with the hypothetical semiparametric
model, we embed the residuals into a cusum process. This summation
has a smoothing effect so that our test is much less sensitive than usual to
a wrong choice of the bandwidth. At the same time, each residual is properly
weighted by a function of the design vector. Our main result, Theorem
2.1, is formulated for a given fixed weight function. Such an approach has
a long tradition in statistics. Typically, score tests are first analyzed (and
optimized) when the direction from which the alternative tends to the null
model is specified. Classical examples are linear one- and two-sample rank
statistics or rank correlation statistics. Also, robust tests focusing on a
neighborhood of a given family of distributions are designed in this spirit.

Theorem 2.1 not only provides the asymptotic normality of a large class 
of score statistics, but also yields (up to a remainder) a representation as 
a sum of i.i.d. variables. From this, when the alternative is specified, we 
shall be able to choose the weights so as to optimize local power. This 
discussion will give us a clue as to how to proceed if the alternative model has 
arbitrary but finite codimension d. In such a situation we propose and study 
a test which is asymptotically distribution-free and shown to be maximin 
(Corollary 2.2). Since $d$ is arbitrary, Corollary 2.2 covers most situations
arising in practice. The i.i.d. representation is also useful for implementation 
of a proper bootstrap approximation. See Section 3 for some details. 

For those readers who prefer omnibus tests, we also discuss (Theorem
2.3) a situation where the deviation from the null model is completely
nonparametric. Also, in this case, the local asymptotic power can be derived.
Finally, we include a discussion of how our test behaves when local peak
alternatives are to be detected.

The paper is organized as follows. In Section 2 we introduce the basic 
test statistics and formulate our main results. In Section 3 we report on 
some simulation results and apply our method to two data sets. Proofs of 
theoretical results are postponed to Section 4. Readers who want to skip the 
technical part may consult Section 2 for an informal discussion and some 
background information on proofs. 

2. Main theorems. Throughout the paper we assume that the available
data $(X_i, Y_i)$, $1 \le i \le n$, are independent and have the same distribution as
$(X, Y)$. Under the null hypothesis, that is, under the single-index model, we have

\[ Y = \Phi(\beta^T X) + \varepsilon, \tag{2.1} \]

where $\beta$ is an unknown vector and $\Phi$ is an unspecified link function defined
on the real line. The noise variable $\varepsilon$ satisfies

\[ E(\varepsilon \mid X) = E(\varepsilon \mid \beta^T X) = 0, \tag{2.2} \]

which is tantamount to saying that

\[ E(Y \mid X) = E(Y \mid \beta^T X) = \Phi(\beta^T X). \tag{2.3} \]

Note that (2.2) allows $\varepsilon$ to depend on $X$ so that (2.1) may include
heteroscedastic errors. The first equation in (2.3) features the projection pursuit
character of the single-index model in that the conditional mean of $Y$
given $X$ only depends on a proper projection of $X$.

To motivate our approach, assume for a moment that we already have
an estimator $\hat\beta$ of $\beta$. Replacing $\beta^T X_i$ with $\hat\beta^T X_i$, we could try to estimate
$\Phi$ through a Nadaraya-Watson estimator $\hat\Phi$ or a local linear smoother as
discussed, for example, in [13]. The disadvantage of these smoothers, at
least in our context, comes from the fact that the distribution of $\hat\beta$, as well
as $X$, will likely have an effect on the distribution of our test statistic,
even in the limit. This phenomenon is well known in many other statistical
problems, when unknown parameters need to be estimated. Typically, the
effect on the distributional character requires some correction through a
proper transformation of the test statistic. See, for example, [34]. Moreover,
the ratio structure of these estimators $\hat\Phi$ creates some technical problems
when the denominator is small, that is, when $x$ lies in a region of low density.
From time to time some structural assumptions on level sets are imposed,
but when it comes down to estimation, these assumptions can hardly be
justified. To avoid all these nasty side effects, we decided to use an
estimator of $\Phi$ which employs a transformation of $X_i$ to a variable which
is approximately uniform on the unit interval $(0,1)$. In other words, we
incorporate a transformation which makes everything distribution-free, as
far as the distribution of $\beta^T X$ is concerned. This estimator is a symmetrized
nearest-neighbor (NN) estimator. Its consistency was proved by Yang [39],
while Stute [32] provided the asymptotic normality. In these papers, the
regression function itself was, of course, the target and the
distribution-freeness only applies to the random deviation but not to the bias term. In the
context of the present paper, the estimator of $\Phi$ only appears as a tool to define
the residuals. When we consider a properly weighted sum of the residuals, averaging yields
a smaller variance to the effect that we may choose smoothing parameters
so that at the same time the bias becomes negligible and the variance part
remains as the only nonnegligible source of error. This more or less enables
us to construct tests which have nontrivial power when the alternatives
approach the null model at the rate $n^{-1/2}$.

To motivate our approach on a more technical level, assume that $\beta^T X$
has a continuous distribution function $F^\beta$, that is,

\[ F(x) = F^\beta(x) := \mathbb{P}(\beta^T X \le x), \qquad x \in \mathbb{R}. \]

Here $\mathbb{P}$ denotes a probability measure defined on a space $(\Omega, \mathcal{A})$ carrying all
random variables which may appear. Denote by $F^{-1}$ the quantile function
of $F$:

\[ F^{-1}(u) = \inf\{x \in \mathbb{R} : F(x) \ge u\}, \qquad 0 < u < 1. \]

Put $U := F(\beta^T X)$. By continuity of $F$, the variable $U$ has a uniform
distribution on $(0,1)$. Setting

\[ \psi = \Phi \circ F^{-1}, \]

equation (2.1) becomes (with probability one)

\[ Y = \psi(U) + \varepsilon. \]

In terms of regression, this may be expressed as

\[ m(x) = E(Y \mid X = x) = \Phi(\beta^T x) = \psi(u), \]

where

\[ u = F(\beta^T x) \quad\text{and}\quad \psi(u) = E(Y \mid F(\beta^T X) = u). \]
Therefore, the kernel estimator for $\psi$ at $0 < u < 1$ becomes

\[ \psi_n(u) = \frac{1}{n} \sum_{i=1}^n Y_i K_h(u - U_i), \]

where

\[ K_h(v) = h^{-1} K(v/h), \]

and $K$ is a symmetric kernel on the real line integrating to one, while
$h = h_n > 0$ is a bandwidth. The random variables

\[ U_i = F^\beta(\beta^T X_i) \]

are i.i.d. from the uniform distribution on $(0,1)$. Since $F$ and $\beta$ are
unknown, $\psi_n$ cannot be our final estimator. For this, replace $\beta$ by some
estimator $\hat\beta$ and $F = F^\beta$ by the empirical distribution function $F_n$ of $\hat\beta^T X_i$,
$1 \le i \le n$. This yields

\[ \hat U_i = F_n(\hat\beta^T X_i), \qquad 1 \le i \le n, \]

with corresponding estimator

\[ \hat\psi_n(u) = \frac{1}{n} \sum_{i=1}^n Y_i K_h(u - \hat U_i). \]

This estimator is related to that in [32], up to the fact that there univariate
$X_i$'s were considered and no preliminary projection was required. The
$\hat U_i$'s are the normalized ranks pertaining to the projected values $\hat\beta^T X_i$. Since
these values depend on the random $\hat\beta$, existing results on rank statistics cannot
give us easy access to the analysis of our final test statistic, in particular,
since the $\hat U_i$'s appear as part of the smoothed function at $u$.

Worse than that, we have to evaluate $\hat\psi_n$ at each $\hat U_j$. This finally leads to
the residuals

\[ \hat\varepsilon_j = Y_j - \hat\psi_n(\hat U_j), \qquad 1 \le j \le n. \]

Actually, to reduce a possible bias, we shall consider estimators $\hat\psi_n^{(j)}$
computed in the same way as $\hat\psi_n$, but with the $j$th datum deleted from the
observations. Hence, the residuals are to be redefined as

\[ \hat\varepsilon_j = Y_j - \hat\psi_n^{(j)}(\hat U_j), \qquad 1 \le j \le n. \]

The mathematical analysis of $\hat\psi_n^{(j)}(\hat U_j)$ and, hence, of $\hat\varepsilon_j$ requires careful
study of the local properties of $F_n$ evaluated at $\hat\beta^T X_i$. The oscillation behavior
for the ordinary empirical process has been investigated in detail
in [30, 32]. In the present situation we need to study the fluctuations of
empirical measures over halfspaces rather than quadrants.
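The construction of the rank-transformed residuals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`triweight`, `normalized_ranks`, `loo_residuals`) are hypothetical, the triweight kernel is one possible choice of a compactly supported, twice continuously differentiable kernel, the leave-one-out sum is normalized by $n - 1$, ties among the projected values are assumed absent, and no boundary correction near 0 or 1 is attempted.

```python
def triweight(v):
    """Triweight kernel: symmetric, supported on [-1, 1], integrates to one."""
    return (35.0 / 32.0) * (1.0 - v * v) ** 3 if abs(v) < 1.0 else 0.0


def normalized_ranks(values):
    """Return F_n(values[i]) for each i, assuming no ties among the values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, index in enumerate(order, start=1):
        ranks[index] = position / n
    return ranks


def loo_residuals(x_rows, y, beta_hat, h):
    """Leave-one-out residuals Y_j - psi_hat^{(j)}(U_hat_j).

    x_rows: list of covariate vectors, y: responses,
    beta_hat: estimated index vector, h: bandwidth on the (0, 1) scale.
    """
    projections = [sum(b * x for b, x in zip(beta_hat, row)) for row in x_rows]
    u_hat = normalized_ranks(projections)
    n = len(y)
    residuals = []
    for j in range(n):
        # Kernel-weighted sum over all observations except the j-th one.
        smooth = sum(
            y[i] * triweight((u_hat[j] - u_hat[i]) / h)
            for i in range(n)
            if i != j
        ) / ((n - 1) * h)
        residuals.append(y[j] - smooth)
    return u_hat, residuals
```

Because the $\hat U_i$ are normalized ranks, no density estimate appears in the denominator; this is the practical payoff of the transformation to the unit interval described above.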
Our final test statistic will be of the form

\[ \hat T_n = n^{-1/2} \sum_{j=1}^n \hat\varepsilon_j W_j. \]

The weights $W_j$ will be of the form $W_j = W(X_j)$. The function $W$ is a
smooth function defined on $\mathbb{R}^p$. A discussion of how to choose $W$ in a testing
situation is postponed to the end of this section. Under the null model (2.2),
we may expect that $\hat T_n$ behaves similarly to

\[ T_n = n^{-1/2} \sum_{j=1}^n \varepsilon_j W_j. \]

Since $W_j$ is orthogonal to $\varepsilon_j$, $T_n$ is centered. Hence, we may expect that
$\hat T_n$ also fluctuates around zero under (2.2). Under (local) alternatives, the
$\hat\varepsilon_j$ also comprise quantities which hopefully are not orthogonal to the $W_j$'s.
If we choose $W$ in a proper way, this fact will guarantee nontrivial power of
the test.

More specifically, we shall first consider models of the type

\[ Y_{in} = \Phi(\beta^T X_i) + n^{-1/2} s(X_i) + \varepsilon_i, \qquad 1 \le i \le n, \tag{2.4} \]

where the $(X_i, \varepsilon_i)$ are i.i.d. satisfying

\[ E(\varepsilon_i \mid X_i) = 0 \quad\text{for } 1 \le i \le n. \tag{2.5} \]

The function $\Phi$, as well as the parameter $\beta$, remain unspecified, as will be
the distribution of $X_i$ and $\varepsilon_i$. The function $s$ may or may not be specified.
When $s = 0$, the single-index model holds. For specified alternatives, we shall
later discuss how to choose $W$ in order to maximize local power.

So far we have not discussed how to estimate $\beta$. We shall come back to this
point in Section 3 when we apply our method in a simulation study and to
real data. In fact, the discussion of $\hat\beta$ may be delayed since our assumptions
on $\hat\beta$ are very general and do not assume any particular form for $\hat\beta$.

We now state the assumptions needed for Theorem 2.1 below. For this,
put, for $0 < u < 1$,

\[ \bar W(u) = E[W(X) \mid U = u], \qquad \bar s(u) = E[s(X) \mid U = u]. \]

Theorem 2.1. Assume that (2.4), (2.5) and the following conditions
hold:

A (i) $\psi$, $\bar s$ and $\bar W$ are twice continuously differentiable.
  (ii) $YW(X)$ and $\varepsilon W(X)$ have finite second moments.
B (i) $E\|X\|^\gamma < \infty$ for some $\gamma > 2$.
  (ii) For all $\theta$ in a neighborhood of $\beta$, the variables $\theta^T X$ have
continuous densities $f_\theta$ which are uniformly bounded.
  (iii) The distribution functions of $\theta^T X$ are continuous in $\theta$ at $\theta = \beta$.
  (iv) The estimator $\hat\beta$ satisfies $n^{1/2}(\hat\beta - \beta) = O_P(1)$.
C (i) $n^{1/2} h^2 \to 0$ and $h^{-1} n^{-1/2 + 1/\gamma} \to 0$.
  (ii) $K$ is a symmetric kernel with compact support, twice continuously
differentiable with $\int K = 1$. Furthermore, $K$ is nonincreasing on the
positive real numbers.

Then we have

\[ \hat T_n = \mu + n^{-1/2} \sum_{i=1}^n \varepsilon_i [W(X_i) - \bar W(U_i)] + o_P(1) \tag{2.6} \]

and, therefore, by the CLT,

\[ \hat T_n \to N(\mu, \sigma^2) \quad\text{in distribution}, \]

where

\[ \sigma^2 = E\{\varepsilon^2 [W(X) - \bar W(U)]^2\} \]

and

\[ \mu = E\{[s(X) - E(s(X) \mid U)] W(X)\}. \]

A discussion of A-C will be postponed until the end of this section.

The drift $\mu$ comprises the deviation of $s(X)$ from the space of variables
spanned by $\beta^T X$. Under the single-index model, the bracket equals zero
and so does $\mu$. Also, $W(X)$ should not depend on $X$ only through $\beta^T X$, since
then also $\mu = 0$. The variance $\sigma^2$ does not depend on $s$ but, among other things,
measures the deviations between $W(X_j)$ and the projected values $\bar W(U_j)$.
The limit variance $\sigma^2$ also does not depend on the unknown $\Phi$. A consistent
estimator of $\sigma^2$ is obtained by

\[ \hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^n \hat\varepsilon_i^2 [W(X_i) - \hat W_n^{(i)}(\hat U_i)]^2, \]

where $\hat W_n^{(i)}$ is defined similarly to $\hat\psi_n^{(i)}$: just replace $Y_i$ with $W(X_i)$ in the
definition of the NN-estimator. Putting

\[ \tilde T_n = \hat T_n / \hat\sigma_n, \]

we then obtain

\[ \tilde T_n \to N(\zeta, 1) \quad\text{in distribution}, \]

with

\[ \zeta = \mu/\sigma. \]
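As a small numerical illustration, the score statistic, its variance estimator and the standardized statistic can be computed from three aligned lists. The sketch assumes that the residuals $\hat\varepsilon_i$, the weights $W(X_i)$ and the smoothed weights $\hat W_n^{(i)}(\hat U_i)$ have already been obtained; the function name `score_statistic` and its argument names are hypothetical.

```python
import math


def score_statistic(residuals, w_values, w_smoothed):
    """Return (T_hat, sigma2_hat, T_hat / sigma_hat).

    residuals:  estimated residuals eps_hat_i
    w_values:   weights W(X_i)
    w_smoothed: NN-smoothed weights evaluated at U_hat_i
    """
    n = len(residuals)
    # Score statistic: n^{-1/2} * sum of eps_hat_i * W(X_i).
    t_hat = sum(e * w for e, w in zip(residuals, w_values)) / math.sqrt(n)
    # Variance estimator: mean of eps_hat_i^2 * (W(X_i) - smoothed weight)^2.
    sigma2_hat = sum(
        (e * (w - ws)) ** 2
        for e, w, ws in zip(residuals, w_values, w_smoothed)
    ) / n
    return t_hat, sigma2_hat, t_hat / math.sqrt(sigma2_hat)
```

For example, `score_statistic([1.0, -1.0, 2.0, -2.0], [1.0, 2.0, 3.0, 4.0], [0.0] * 4)` gives a score statistic of $-1.5$ and a variance estimate of $26.25$.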

The null model is rejected at level $\alpha$ if

\[ |\tilde T_n| > \lambda_{1-\alpha/2} = \lambda, \]

where $\lambda$ is the $(1 - \alpha/2)$-quantile of the standard normal distribution function
$\Phi$ (not to be confused with the link function). Hence, the asymptotic power of
$|\tilde T_n|$ against the local alternatives (2.4) equals
$1 - [\Phi(\zeta + \lambda) - \Phi(\zeta - \lambda)]$. This is a monotone function of $|\zeta|$. Thus,
we should select the weight function $W$ in a way that makes $|\zeta|$ as large as
possible. If we write, in an obvious notation,

\[ \zeta^2 = \frac{\mu^2}{\sigma^2} = \frac{E^2\{[s - \bar s][W - \bar W]\}}{E\{\varepsilon^2 [W - \bar W]^2\}}, \]

it is easy to determine the optimal solution of our problem when the $\varepsilon$'s are
independent of $X$, that is, if the homoscedastic case holds. Then the above
ratio equals

\[ \frac{E^2\{[s(X) - \bar s(U)][W(X) - \bar W(U)]\}}{E\varepsilon^2 \, E[W(X) - \bar W(U)]^2}, \]

and the Cauchy-Schwarz inequality immediately yields that the optimal
weight function $W_0$ equals, up to a constant factor, the function $s$:

\[ W_0(x) = s(x). \tag{2.7} \]
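The asymptotic power expression for the two-sided test is easy to evaluate numerically. The following sketch uses Python's standard-library `statistics.NormalDist`; the function name `local_power` is hypothetical, and `zeta` stands for the standardized drift $\mu/\sigma$.

```python
from statistics import NormalDist


def local_power(zeta, alpha=0.05):
    """Asymptotic power 1 - [N(zeta + lam) - N(zeta - lam)] of the two-sided test,

    where N is the standard normal distribution function and lam its
    (1 - alpha/2)-quantile.
    """
    normal = NormalDist()
    lam = normal.inv_cdf(1.0 - alpha / 2.0)
    return 1.0 - (normal.cdf(zeta + lam) - normal.cdf(zeta - lam))
```

At `zeta = 0` the function returns the level `alpha`, and it increases in `abs(zeta)`, which is why a weight function maximizing $|\zeta|$ is sought.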

Next we study an important extension of (2.4). For this, let $s_1, \ldots, s_d$ be any
finite number of functions, where $d \ge 1$. In applications, these functions may
constitute a possible (mean) dependence of $Y$ on $X = x$ other than projections
of $X$. For example, some of the $s$-functions may be quadratic forms,
and others may be in charge of possible interactions between coordinates of
$X$.

Instead of (2.4), we therefore consider the more complex model

\[ Y_{in} = \Phi(\beta^T X_i) + n^{-1/2} \sum_{j=1}^d \gamma_j s_j(X_i) + \varepsilon_i, \qquad 1 \le i \le n, \tag{2.8} \]

where $\beta \in \mathbb{R}^p$, $\gamma_1, \ldots, \gamma_d \in \mathbb{R}$ are unknown parameters and $\Phi$ is a nonspecified
link function. The null model thus corresponds to

\[ H_0 : \gamma_1 = \cdots = \gamma_d = 0. \]

In the following we shall derive maximin tests for $H_0$ versus $\|\gamma\| \ge c$, where
$\|\cdot\|$ is a proper norm and $\gamma^T = (\gamma_1, \ldots, \gamma_d)$. Needless to say, such test
problems have been well studied in the context of linear regression. The present
situation is much more complex since now the null model is the semiparametric
single-index model. To the best of our knowledge, the following setup
provides the first maximin test in semiparametric regression. For this, and
in view of (2.7), we consider the score statistics $\hat T_n^j$ pertaining to $W = s_j$,
$j = 1, \ldots, d$. Put

\[ \hat T_n = (\hat T_n^1, \ldots, \hat T_n^d)^T. \]

Theorem 2.1 implies that, under (2.8) (in the homoscedastic case), we have
in distribution, as $n \to \infty$,

\[ \hat T_n \to \Sigma (\gamma_1, \ldots, \gamma_d)^T + N_d(0, \rho^2 \Sigma). \tag{2.9} \]

Here, $\Sigma = (\sigma_{ij})_{1 \le i,j \le d}$ with

\[ \sigma_{ij} = E\{[s_i(X) - E(s_i(X) \mid U)][s_j(X) - E(s_j(X) \mid U)]\}, \]

$N_d$ denotes a normal distribution on $\mathbb{R}^d$ and $\rho^2 = E\varepsilon^2$. Assertion (2.9)
exhibits that, in the limit, $\hat T_n$ is a standard Gaussian shift model.
Distributional characteristics of the model (2.8) only appear through the (estimable)
covariance matrix. This observation once again supports our approach, in
particular, the use of the NN-smoother and the rank transformation.

We may now use existing maximin theory to obtain optimal tests for
$H_0$. See, for example, [29], Theorem 30.2. For this define $\hat\Sigma_n = (\hat\sigma_{ijn})_{1 \le i,j \le d}$
through

[display defining $\hat\sigma_{ijn}$ lost in extraction]

Corollary 2.2. For a given significance level $0 < \alpha < 1$, the test

[display of the test lost in extraction]

is a maximin $\alpha$-test for $H_0$ versus $H_1 : \gamma^T \Sigma \gamma \ge \rho^2 a$. Here $c_\alpha$ is the
$(1 - \alpha)$-quantile of the chi-square random variable $\chi^2_d$ with $d$ degrees of freedom. The
asymptotic maximin power is given by $\mathbb{P}(\chi^2_d(a) > c_\alpha)$, where now $a$ is the
noncentrality parameter.
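Since the display of the test is not available here, the following Python sketch shows only one quadratic-form statistic that is consistent with the Gaussian shift limit (2.9), namely $\hat T_n^T \hat\Sigma_n^{-1} \hat T_n / \hat\rho^2$ compared with a chi-square quantile. This form, the function names and the plug-in arguments `sigma_hat` and `rho2_hat` are assumptions made for illustration, not necessarily the authors' exact test. The closed-form critical value is given only for $d = 2$, where the chi-square law is exponential.

```python
import math


def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def quadratic_form_statistic(t_hat, sigma_hat, rho2_hat):
    """Assumed statistic t_hat' sigma_hat^{-1} t_hat / rho2_hat."""
    z = solve_linear(sigma_hat, t_hat)
    return sum(t * zi for t, zi in zip(t_hat, z)) / rho2_hat


def chi2_quantile_two_df(alpha):
    """(1 - alpha)-quantile of the chi-square law with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)
```

With $d = 2$ one would reject when `quadratic_form_statistic(...)` exceeds `chi2_quantile_two_df(alpha)`; for general $d$ a chi-square quantile from a statistics library would be needed.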

Since the codimension $d$ is arbitrary, Corollary 2.2 covers many examples
of interest. Some, for example, interaction alternatives, are studied in Section
3. For those who prefer omnibus tests, we now discuss a class of tests which
has reasonable power over a nonparametric class of alternatives.

Hence, we come back to (2.4) but leave $s$ unspecified. In order to achieve
power, we need to consider a family of weight functions $\{W_\gamma\}_\gamma$ guaranteeing
that at least one is able to detect a possible deviation of $s(X) - \bar s(U)$ from
zero. A class of (smooth) score functions which has found a lot of interest
in classical empirical process theory is the family of trigonometric functions.
This led to an intensive study of the empirical characteristic function. See,
for example, [12] for a nice review and further applications. In our context,
$W$ therefore becomes

\[ W(\gamma, x) = \exp[i\gamma^T x], \tag{2.10} \]

where $i$ is the imaginary unit and $\gamma \in \mathbb{R}^p$. If we take only finitely many $\gamma$'s, we
may conceive, as in Corollary 2.2, asymptotically distribution-free $\chi^2$-tests.
To handle a nonparametric alternative, we have to let $\gamma$ vary over $\mathbb{R}^p$. Hence,
we come up with a stochastic process

\[ \hat T_n(\gamma) := n^{-1/2} \sum_{j=1}^n \hat\varepsilon_j W_j(\gamma), \]

where $W_j(\gamma) = W(\gamma, X_j)$. Note that $\hat T_n$ has continuous sample paths in $\gamma$.
The convergence of the finite-dimensional distributions again follows from
(2.6). Tightness is not difficult as long as $\gamma$ varies in a compact set, since
the $W(\gamma, x)$ are smooth functions in $\gamma$ and $x$. For detailed arguments, one
needs to check the proof of Theorem 2.1 and show that the remainders are
uniformly small on compact $\gamma$-sets, while the leading terms are uniformly
continuous. After all this we then come up with the following result.

Theorem 2.3. Under the assumptions of Theorem 2.1, the stochastic
processes $\{\hat T_n(\gamma) : \gamma \in \mathbb{R}^p\}$ converge in distribution (on compact sets) to a
continuous Gaussian stochastic process $T_\infty$ such that

\[ \mu(\gamma) \equiv E T_\infty(\gamma) = E\{[s(X) - \bar s(U)] W(\gamma, X)\} \tag{2.11} \]

and

\[ \operatorname{Cov}(T_\infty(\gamma_1), T_\infty(\gamma_2)) = E\{\varepsilon^2 [W(\gamma_1, X) - \bar W(\gamma_1, U)][W(\gamma_2, X) - \bar W(\gamma_2, U)]\}. \]

A Kolmogorov-Smirnov (KS) type test rejects $H_0$ if

\[ T_n = \sup_\gamma |\hat T_n(\gamma)| > c_\alpha, \]

where $c_\alpha$ is the $(1 - \alpha)$-quantile of $\sup_\gamma |T_\infty(\gamma)|$ under $H_0$, that is, $s = 0$.
Since this test is no longer distribution-free, a bootstrap approximation is
recommended. See Section 3 for further details. For power considerations,
we expand $\mu(\gamma)$ at $\beta$ yielding

\[ \mu(\gamma) = E\{[s(X) - \bar s(U)] W(\beta, X) \exp[i(\gamma - \beta)^T X]\} \]
\[ \qquad \approx E\{[s(X) - \bar s(U)] W(\beta, X)\} + i(\gamma - \beta)^T E\{[s(X) - \bar s(U)] W(\beta, X) X\}. \]

The first integral vanishes, since $s(X) - \bar s(U)$ is orthogonal to the space
of random variables measurable w.r.t. $\beta^T X$. The second (vector-valued)
integral $I = I(s)$, say, usually does not vanish so that, for example,

\[ \sup_\gamma |\mu(\gamma)| \sim \sup_\gamma \|\gamma - \beta\| \, \|I\| > 0. \]

This property guarantees that the KS-test has asymptotic power $> \alpha$
uniformly for all $s$ for which $\|I(s)\|$ is bounded away from zero.
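The test process with trigonometric weights is straightforward to evaluate on a finite set of frequencies. The sketch below replaces the supremum over a compact $\gamma$-set by a maximum over a user-supplied grid, which is an approximation assumed here for illustration; the function name `ks_statistic` and its arguments are hypothetical, and critical values (which require resampling) are not computed.

```python
import cmath
import math


def ks_statistic(residuals, x_rows, gammas):
    """Maximum over a finite gamma grid of |n^{-1/2} sum_j eps_hat_j exp(i gamma' X_j)|.

    residuals: estimated residuals eps_hat_j
    x_rows:    covariate vectors X_j
    gammas:    finite list of frequency vectors approximating a compact set
    """
    n = len(residuals)
    largest = 0.0
    for gamma in gammas:
        total = 0j
        for eps, row in zip(residuals, x_rows):
            phase = sum(g * x for g, x in zip(gamma, row))
            total += eps * cmath.exp(1j * phase)
        largest = max(largest, abs(total) / math.sqrt(n))
    return largest
```

A finer grid can only increase the returned value, so the grid maximum is a lower bound for the supremum over the compact set it samples.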

Needless to say, a version of Theorem 2.3 also holds for other parametric
families of functions $W(\gamma, \cdot)$. We focused on trigonometric functions since
they are at the same time smooth and measure determining and allow for a
simple expansion of the drift function.

Though our results cover a large class of local alternatives, people sometimes
are interested in detecting so-called "peak alternatives." For this, one
needs to consider shift functions $s$ which depend on $n$ in such a way that, as
$n \to \infty$, $s_n$ (weakly) converges to a Dirac function or a linear combination
of such functions. A typical candidate is

(2.12)    [display lost in extraction]

where $a_n \to 0$ but $n a_n \to \infty$. The "density" $\varphi$, as well as $x_0$, the center
of the peak, remain unspecified. The test process $\hat T_n(\cdot)$ may also serve as
a basis to detect alternatives (2.8), where some of the $s_j$'s are of "global
type," that is, do not depend on $n$. Others may be of type (2.12). Since the
covariance is not affected by the shift, the limit covariance remains the same
as in Theorem 2.3. Relevant proofs only deal with the null model so that
no changes are required. The shift only enters into Lemmas 4.4 and 4.5,
resulting in Corollary 4.6. Taking into account the local flavor of (2.12),
these lemmas need some minor modifications resulting, under $s = s_n$ from
(2.12), in the drift function

\[ \mu(\gamma) = E T_\infty(\gamma) = [s(x_0) - \bar s(u_0)] W(\gamma, x_0) \varphi(0) f(x_0), \tag{2.13} \]

where $f$ is the density of $X$. Here $u_0 = F(\beta^T x_0)$. Details are omitted. The
function (2.13) nicely features the components which determine the power
of the test when $s$ equals (2.12):

• The $X$-density at $x_0$: $f(x_0)$.

• The "height" of the peak at $x_0$: $\varphi(0)$.

• The deviation of $s$ from the null model at $x_0$: $s(x_0) - \bar s(u_0)$.

If we let $\gamma$ vary over a large compact set, the Kolmogorov-Smirnov test
associated with $\hat T_n$ is able to detect peak alternatives which converge to the
null model at the rate $n^{-1/2}$. The asymptotic power exceeds $\alpha$ but is less than
one, depending on the three components discussed above. In particular, our
approach yields the correct asymptotics. This finding should be compared
with other approaches, where, for much simpler purely parametric regression
models, alternatives had to converge to the null model at a rate lower than
$n^{-1/2}$. See, for example, [21] and references therein. Not unexpectedly, the
power then converges to one.

We continue with some comments on A-C.

Remark 2.4. Condition A comprises standard smoothness and moment
assumptions on the involved functions. Condition B requires some weak
conditions on the design vector and on $\hat\beta$. In C, $\sqrt{n} h^2 \to 0$ will be needed to
make the bias tend to zero. The second assumption on $h$ will be needed to
control the fluctuations of the random sums. In view of the fact that we
always deal with standardized sums and also that large $X_j$'s may enter the
statistics, some connection with the tails of $X$ (in terms of $\gamma$) is natural.
The conditions on $K$ are also standard. The monotonicity of $K$ guarantees
that $K'$ has a constant sign on each of the positive and negative half-lines. Moreover,
$K'(0) = 0$. In other words, $K'$ may be decomposed into two parts, each of
which is compactly supported, on the positive and negative half-lines,
respectively, and has a constant sign there. This property is useful in proofs
when, after Taylor's expansion, $K'$ appears as a smoothing kernel.

Remark 2.5. The conditions on $h$ are weak and are satisfied for a large
class of bandwidths. A referee pointed out that this fact could be interpreted
as a kind of robustness of the method w.r.t. the choice of $h$. In particular,
they do not depend, as in related work, on the dimension $p$ of the $X$-vector
or higher degrees of smoothness of the involved functions. We may choose $h$
so that $n^{1/2} h^2$ and $h^{-1} n^{-1/2 + 1/\gamma}$ are of the same order. This yields

\[ h \sim n^{-1/3 + 1/(3\gamma)}. \]

In the next section we propose two adaptive methods of bandwidth choice
which worked very well in our simulation study. If we are not only interested
in maximizing power for a given alternative, we may choose a $W$ with compact
support. In this way the test is robust against outliers among the $X_j$'s.
Our proof then works with $\gamma = \infty$, that is, $1/\gamma = 0$. In this case, $h \sim n^{-1/3}$.

Discussion 2.6. It is time to compare our approach and results with
those of Fan and Li [14], Aït-Sahalia, Bickel and Stoker [1] and Xia, Li, Tong
and Zhang [38]. The tests of the first two papers are based on a (weighted)
residual sum of squares and are in the spirit of Härdle and Mammen [17].
The asymptotic normality of the test statistic is achieved by a clever
application of central limit theorems for sequences of degenerate $U$-statistics.
More precisely, Fan and Li [14] (FL) based their test on a quadratic form of
the estimated residuals. Since no rank transformation is involved, they had
to weight each residual with estimators of marginal and high-dimensional
densities, to get rid of the denominator in the Nadaraya-Watson estimator.
Consequently, two different smoothing parameters need to be involved.
It is heuristically argued that local alternatives can only be detected when
they approach the null model at the rate $O((nh^{p/2})^{-1/2})$, which gets worse
as the dimension of $X$ increases. The estimator of $\beta$, being square-root
consistent, does not have any impact on the limit distribution because the
other quantities converge at a slower rate, thus compensating for the effect
of estimating unknown parameters. In a general situation of testing a
model or hypothesis, efficient methods involve test statistics and estimators
which admit expansions of the same order. See, for example, [9], to
name only one landmark paper on this topic. Unless some orthogonality
assumptions are satisfied, the parameter estimator does have an impact on
the limit, and martingale transformations, as in [36], were designed to keep
track of this issue. See also [34]. Efficient model checks would therefore create
terms which, when replacing $\beta$ with $\hat\beta$, are not negligible and thus have
an impact on the distributional behavior of the test statistic. As to practical
applications, computation of critical values would then not be easy.
Worse than that, the complicated geometric structure of the test statistic
would not enable us to derive optimal scores. Actually, these are only
two of several reasons why we designed our test as we did. There are others.
As a by-product, the assumptions on the design variable $X$ remain
weak. No additional support or higher smoothness conditions need to be
assumed. The variable $Y$ may be discrete and no joint density of $X$ and
$Y$ is required. Compared with Fan and Li [14], Aït-Sahalia, Bickel and
Stoker [1] is mainly concerned with the problem of dimension reduction
for high-dimensional inputs. Only some comments on the applicability to
single-index models are included. Their test statistic is a sum of weighted
residual squares, the weights now being deterministic functions of the
regressors. In their Proposition 2 the local power of the test is derived when
the alternatives tend to the null model at a rate depending on $p$. It should
also be mentioned that the test statistic admits a bias increasing to infinity
as $n \to \infty$. Moreover, the constants defining the asymptotic bias are unknown
and require further smoothing when being estimated. The situation is similar in
Xia, Li, Tong and Zhang [38], who extended the marked empirical process
approach of Stute, González Manteiga and Presedo Quindimil [35] in the
parametric case to the single-index model. Compared with these papers our
test achieves local power known from parametric tests, though the nonparametric
components can only be estimated at a worse rate. Mathematically,
we have to pay a price for this. For example, Theorem 2.1 cannot be obtained
by just applying Taylor's expansion and $U$-statistic theory. Rather,
our proofs require some new techniques involving (local and global) properties
of the rank-transformed projected values $\hat\beta^T X_i$, $1 \le i \le n$. Unfortunately,
techniques also developed in [31] to analyze the (rank-transformed)
nearest-neighbor regression function estimator at a point are of no help
here.

3. Simulation study and applications. 

3.1. A simulation study. In our simulations we studied two models. The 
first is with continuous response, namely, 



\ 1=1 / 

where $X$ and $\varepsilon$ are independent, $x_i$ are the components of $X$, and the distributions of $X$ and $\varepsilon$ are $N(0, I_p)$ and $N(0, 1)$, respectively. The hypothetical model is $\Phi(\beta^T X) =$ and $s(X) = \sum_{i=1}^p |x_i|$. Therefore, the null model holds if and only if $c = 0$.

The second model, (3.2), is with binary response: $Y \in \{0, 1\}$ is a binary variable for which $Y = 1$ with probability $\Phi(-\beta^T x + c\,s(x))$ for any given $X = x$. Here too, $c = 0$ corresponds to the hypothetical model, that is, the logit model. It is heteroscedastic, and $X$ and $\varepsilon$ are not independent. Again, $X \sim N(0, I_p)$. We used $c = 1, 2, 3$ to investigate the power of the test.
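To make the binary-response design concrete, the following is a minimal sketch of how data from model (3.2) could be generated. It assumes the logistic link $\Phi(t) = e^t/(1+e^t)$ and the sign convention $-\beta^T x + c\,s(x)$ used in the discussion of the weight functions; the function name `simulate_binary_model` and the seed are illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_binary_model(n, beta, c, rng):
    """Draw (X, Y) from a logit model with departure c * s(x) (sketch of model (3.2))."""
    p = len(beta)
    X = rng.standard_normal((n, p))        # X ~ N(0, I_p)
    s = np.abs(X).sum(axis=1)              # s(x) = sum_i |x_i|
    eta = -X @ beta + c * s                # assumed sign convention
    prob = 1.0 / (1.0 + np.exp(-eta))      # logistic link exp(t) / (1 + exp(t))
    Y = (rng.uniform(size=n) < prob).astype(int)
    return X, Y, prob

rng = np.random.default_rng(0)             # illustrative seed
beta = np.array([1.0, -1.0]) / np.sqrt(2.0)
X, Y, prob = simulate_binary_model(100, beta, c=1.0, rng=rng)
```

Setting `c=0` in this sketch gives the null (logit) model; `c = 1, 2, 3` give the alternatives.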

Two weight functions were considered in the simulation, $W_1(x) = s(x)$ and $W_2(x) = \sum_{i=1}^p x_i^2$. Based on our findings in Section 2, $W_1$ is optimal for model (3.1) as $\varepsilon$ is independent of $X$, and $W_2$ is a natural candidate for an even function. For model (3.2), we also use these two weight functions due to the following observation: When $c$ is small, $\Phi(-\beta^T x + c\,s(x))$ is close to $\Phi(-\beta^T x) + c\,\Phi'(-\beta^T x)s(x)$, where $\Phi'(\cdot)$ is the derivative of $\Phi(\cdot)$. Therefore, $s(x)$ is also a good choice of a weight function in this case.

In order to implement the omnibus test based on $\tilde T_n = \sup_\gamma |T_n(\gamma)|$ of Theorem 2.3, we have to use a resampling approximation to determine critical values. The wild bootstrap is clearly an option. In view of (2.6), however, we suggest the following algorithm: for any $\gamma$, $T_n(\gamma)$ is asymptotically equal to $\mu + n^{-1/2}\sum_{i=1}^n \varepsilon_i[W_i - \bar W(U_i)]$. Under $H_0$, $\mu = 0$. For any i.i.d. random variables $e_i$, $i = 1, \ldots, n$, independent of the $(x_j, y_j)$'s with mean 0 and variance 1, it is easy to prove that, for almost all sequences $\{(x_1, y_1), \ldots, (x_n, y_n), \ldots\}$, the process $T_n^*(\gamma) = n^{-1/2}\sum_{i=1}^n e_i\hat\varepsilon_i[W_i - \bar W_n(\hat U_i)]$ has the same limit as $T_n(\gamma)$. It is worthwhile noting that, using this resampling scheme, we do not need to estimate the variance. In a different setup, this algorithm has been used by Zhu [40] and Zhu and Ng [42]. The proof and the procedure are similar. We omit the details. To implement the test, we can generate, by Monte Carlo, $m$ sets of $\{e_1, \ldots, e_n\}$ and then compute $m$ values of $T_n^* = \sup_\gamma |T_n^*(\gamma)|$. The $[(1-\alpha)m]$th value can be used as the critical value, where $\alpha$ is the significance level and $[a]$ stands for the integer part of $a$. In the following simulation, we used standard normal random variables $e_i$.
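The resampling scheme just described can be sketched as follows. This is only an illustration under stated assumptions: the array `summands` is a hypothetical input whose entry `[g, i]` holds $\hat\varepsilon_i[W_i - \bar W_n(\hat U_i)]$ for the $g$-th point of a finite grid of $\gamma$ values (the supremum is approximated by a maximum over that grid), and the "$[(1-\alpha)m]$th value" is taken to mean the corresponding order statistic of the $m$ resampled suprema.

```python
import numpy as np

def multiplier_critical_value(summands, m=1000, alpha=0.05, rng=None):
    """Critical value of sup_gamma |T_n^*(gamma)| from standard normal multipliers.

    summands: array of shape (n_gamma, n); hypothetical precomputed terms
              eps_hat_i * (W_i - Wbar_n(U_hat_i)) for each gamma on a grid.
    """
    rng = np.random.default_rng() if rng is None else rng
    summands = np.asarray(summands, dtype=float)
    n = summands.shape[1]
    e = rng.standard_normal((m, n))                # multipliers: mean 0, variance 1
    t_star = e @ summands.T / np.sqrt(n)           # shape (m, n_gamma): T_n^*(gamma)
    sup_star = np.abs(t_star).max(axis=1)          # sup over the gamma grid
    k = max(int(np.floor((1.0 - alpha) * m + 1e-9)), 1)   # [(1 - alpha) m]
    return np.sort(sup_star)[k - 1]                # k-th smallest resampled value
```

The observed statistic would then be compared with the returned value; no variance estimate enters the computation, in line with the remark above.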

Another concern is bandwidth selection. As we noticed in Remark 2.5, $h \sim n^{-1/3}$. In other words, compared with nonparametric estimation of regression, in the context of model checking, undersmoothing is needed. So existing bandwidth selection methods cannot be recommended in the setting of this paper and, indeed, may lead to a considerable bias. Therefore, we adopt a semi-data-driven selection procedure. The steps are as follows:



(3.2) $$Y = \frac{\exp\bigl(-\beta^T X + c\sum_{i=1}^p |x_i|\bigr)}{1 + \exp\bigl(-\beta^T X + c\sum_{i=1}^p |x_i|\bigr)} + \varepsilon \equiv \Phi(-\beta^T X + c\,s(X)) + \varepsilon,$$
Table 1
Size of the tests $T_n$ and $\tilde T_n$ ^a

                        Model (3.1)                        Model (3.2)
Test            p    n = 50          n = 100           n = 50          n = 100
$W_1$           2    0.048 (0.046)   0.045 (0.047)     0.060 (0.056)   0.057 (0.054)
$W_1$           3    0.053 (0.053)   0.047 (0.052)     0.054 (0.052)   0.052 (0.054)
$W_2$           2    0.048 (0.047)   0.052 (0.053)     0.055 (0.054)   0.052 (0.054)
$W_2$           3    0.047 (0.053)   0.046 (0.051)     0.055 (0.055)   0.050 (0.054)
$\tilde T_n$    2    0.048 (0.051)   0.053 (0.052)     0.058 (0.056)   0.057 (0.051)
$\tilde T_n$    3    0.045 (0.048)   0.052 (0.054)     0.061 (0.053)   0.054 (0.049)

^a The values in parentheses are the estimated sizes when the bandwidth is selected by a grid search.



1. Select $h_1$ by minimizing the mean integrated squared error, subject to weight function $W(\cdot)$,

(3.3) $$\mathrm{MISE}(h) = \sum_{j=1}^n \bigl(Y_j - \hat\psi_n^{j}(\hat U_j)\bigr)^2 W^2(X_j),$$

which is analogous to the criterion used by Hardle, Hall and Ichimura [16]. The kernel $K$ is $K(u) = \frac{15}{16}(1 - u^2)^2 I(|u| \le 1)$; see [17].

2. Our final choice for $h$ is $h = h_1 \times n^{-1/3+1/5}$.

The rationale of this algorithm is that, under our conditions and the choice of the kernel function, the rate of $h_1$ is $n^{-1/5}$. Therefore, $h$ is of the order $n^{-1/3}$ and, hence, ensures convergence of the test statistic. For validation purposes we also considered a grid point search and chose $h$ so that the empirical level was closest to the nominal level.
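The two-step bandwidth rule can be sketched in a few lines. The sketch below is an illustration, not the authors' code: it assumes the criterion in step 1 is a weighted leave-one-out squared prediction error on the rank-transformed index values `u`, uses the divisor $(n-1)h$ because those values are approximately uniform, and searches a user-supplied grid. The function names are hypothetical.

```python
import numpy as np

def quartic_kernel(u):
    """K(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def loo_fit(y, u, h):
    """Leave-one-out kernel fit at each u_i (assumed divisor (n - 1) h)."""
    n = len(y)
    k = quartic_kernel((u[:, None] - u[None, :]) / h)
    np.fill_diagonal(k, 0.0)                      # drop the i-th observation
    return k @ y / ((n - 1) * h)

def select_bandwidth(y, u, w, grid):
    """Step 1: minimize the weighted criterion over the grid; step 2: rescale."""
    n = len(y)
    crit = [np.sum((y - loo_fit(y, u, h)) ** 2 * w**2) for h in grid]
    h1 = grid[int(np.argmin(crit))]
    h = h1 * n ** (-1.0 / 3.0 + 1.0 / 5.0)        # h = h1 * n^(-1/3 + 1/5)
    return h1, h
```

The rescaling factor $n^{-1/3+1/5} = n^{-2/15}$ turns a bandwidth of order $n^{-1/5}$ into one of order $n^{-1/3}$, which is the undersmoothing the text asks for.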

Finally, we need to estimate the parameter $\beta$. There are at least three methods in the literature; see [16, 20, 25]. In our simulation study we applied Li and Duan's least squares estimator for ease of implementation.
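As a rough illustration of a least squares estimate of the direction, one may regress the response on the centered covariates and normalize the slope vector. This is a sketch under the assumption that the direction is identified up to scale (as it is for elliptically symmetric covariates); the function name `ls_direction` is hypothetical and the sketch is not claimed to reproduce the exact estimator of [25].

```python
import numpy as np

def ls_direction(X, y):
    """Normalized ordinary least squares slope as an estimate of the index direction."""
    Xc = X - X.mean(axis=0)                               # center the covariates
    coef, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
    return coef / np.linalg.norm(coef)                    # only the direction matters
```

Since the link function is left unspecified, only the direction of the slope is used; its length is absorbed into the link.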

We considered the case with $p = 2, 3$ and $\beta = (1, -1)^T/\sqrt{2}$, $\beta = (1, -1, 1)^T/\sqrt{3}$, respectively. The sample sizes were $n = 50, 100$. The significance level was $\alpha = 0.05$. The test statistics were computed for 1000 replications.

Table 1 presents the attained levels for the various scenarios. 

It becomes apparent that the significance level is well attained in most 
cases, although, for model (3.2), the size of the tests for n = 50 is slightly 
larger than 0.05. Furthermore, the size of the tests with the bandwidth 
selected by the above algorithm is similar to that obtained from the grid 
point search. This shows that our data-driven approach works well. We will 
therefore use this algorithm also to select the bandwidth in the following 
simulation and the applications to two real data examples. 
To demonstrate power through simulations, we considered models (3.1) and (3.2) with $c = 1, 2, 3$.

For model (3.1), as expected, the test $T_n$ based on the optimal $W_1$ outperforms the others. In model (3.2), when we have dependent errors and $T_n$ is no longer optimal, all three tests have a similar behavior.

To compare the performance of our method with other existing tests 
through a simulation study, we considered two scenarios. The first aim was 
to test the single index model versus the existence of interaction effects. 
Particularly, we considered 

(3.4) $$m(x) = (\beta^T x)^2 + c_1|x_1x_2| + c_2|x_1x_3| + c_3|x_2x_3|.$$

For nonvanishing $c$'s, this model allows for interaction terms. The comparison is among our maximin test, the omnibus test $\tilde T_n$, Fan and Li [14] (FL-test) and Ait-Sahalia, Bickel and Stoker [1] (ABS-test). In the simulation, similar to the previous case, we took $\beta = (1, -1, 1)^T/\sqrt{3}$. The sample size was $n = 50$, while the significance level was 0.05. The constants were taken to be equal: $c_1 = c_2 = c_3 = c$ with $c = 0, 1.0, 2.0, 3.0$; $c \ne 0$ corresponds to the alternative. In Figure 3 the estimated power was computed from 1000 replications. Recall that FL- and ABS-tests require selection of two bandwidths. Since the significance levels of their tests heavily depend on the choice of the bandwidths and there is no data-driven selection, a fair comparison causes some problems. In a simulation study, however, one may determine (through replications) the bandwidth on a grid in such a way that the nominal level is best attained. In this way we are able to produce tests which attain the right level for the null model.

We also ran many simulations with other bandwidths. It turned out that 
the FL-test and the ABS-test are nonrobust in h so that the nominal level 
may not be attained after a slight change in h. 

As expected, $T_n$ with optimal weight $W_1$ has larger power than the test with weight function $W_2$. $\tilde T_n$ has a power similar to $T_n$ with $W_2$. The FL- and ABS-tests are clearly outperformed but behave similarly otherwise in the situation considered by us. Similar to the case with model (3.1), the FL-test has larger power than the ABS-test.

Fig. 3. The estimated power for model (3.4): The dashdot line is for the maximin test with the weight function $W_1$, the solid line with the weight function $W_2$; the dotted line is for the ABS-test, the dashed line for the FL-test, and the dashed line plus star * for $\tilde T_n$.

We also compared the performance of all tests for a model studied by Xia, 
Li, Tong and Zhang [38] in their Example 1, where, in our notation, p = 2 
and 

$$m(x) = x_1 + x_2 + 4\exp\{-(x_1 + x_2)^2\} + c(x_1^2 + x_2^2)^{1/2},$$

and the errors $\varepsilon$ are independent of $X$ with $\varepsilon \sim N(0, \sigma^2)$.

In Table 2 we report on the power results of $T_n$ with $W_1(\cdot)$ and $W_2(\cdot)$, $\tilde T_n$, ABS- and FL-tests and the XLTZ-test. The bootstrap approximation of the XLTZ-test is similar to that of Theorem 2.3. For $T_n$, we again used the weights $W_1(x) = |x_1| + |x_2|$ and $W_2(x) = x_1^2 + x_2^2$. The significance level was 0.05. The test statistics were computed for 1000 replications. Note that these two weights are not optimal for this model. We do not report the results with the optimal weights because the previous simulations have provided evidence of its good performance and, from Table 2, we can see that the suboptimal weights $W_1$ and $W_2$ already work well. Again, for ABS and FL, bandwidths were chosen so as to yield the nominal level under $H_0$ as closely as possible.

Table 2
Estimated power of six tests with $n = 50$, $p = 2$ ^a

                          0.30                          0.50
c                 0       0.25    0.50          0       0.25    0.50
$T_n(W_1)$        0.044   0.122   0.508         0.052   0.106   0.452
$T_n(W_2)$        0.060   0.092   0.408         0.062   0.090   0.300
XLTZ-test         0.063   0.099   0.376         0.043   0.043   0.163
$\tilde T_n$      0.063   0.090   0.350         0.043   0.073   0.253
ABS-test          0.050   0.060   0.140         0.050   0.055   0.085
FL-test           0.042   0.052   0.090         0.050   0.046   0.065

^a $T_n(W_i)$, $i = 1, 2$, stand for the tests $T_n$ with $W_1$ and $W_2$, respectively.

In Table 2, the values for the XLTZ-test are from Table 1 of [38]. We see that $T_n$ with $W_1$ is best. Second, between $\tilde T_n$ and the XLTZ-test, when the variance $\sigma^2$ of the errors is small, the XLTZ-test is slightly better, while when $\sigma^2$ gets large, $\tilde T_n$ outperforms the XLTZ-test. Third, comparing $\tilde T_n$ with $T_n$ with $W_2$, we see that $\tilde T_n$ performs slightly worse. For this model, we find that the ABS- and FL-tests do not work well.

3.2. Applications. In this section we apply our test to two data sets. 

Example 3.1. The data set is the bull data; see [24]. The data are the 
measured characteristics of 76 young bulls sold at an auction. It is interesting 
to study the relationship between the selling prices and the characteristics 
of the bulls: yearling height at shoulder; fat-free body (pounds); percentage 
of fat-free body; scale from 1 (small) to 8 (large); back fat (inches); sale 
height at shoulder (inches) and scale weight (pounds). The response Y is the 
standardized selling price and the other standardized measurements are the 
covariates $X = (x_1, \ldots, x_7)$. Figure 4(a) provides a plot of $\hat\beta^T X$ against the response $Y$. This linear fitting was also used in [24]. There is some indication of a relationship between the residuals $\hat\varepsilon_j$ and $\hat\beta^T X_j$; see Figure 4(b). We
tested the linearity of the model using the Stute, Gonzalez Manteiga and 
Presedo Quindimil [35] test. The p-value was 0.044. Therefore, the linear 
model needs to be rejected at level a = 0.05. 

Next consider single-index fitting. Again $\beta$ was estimated as in [25]. To justify their estimation method, we first tested the elliptical symmetry of the distribution of $X$. The nonparametric Monte Carlo test proposed by Zhu and Neuhaus [41] was employed. The p-value was 0.83. The statistic $T_n$
was computed for the weight function VF(x) = X]j=i^j- The kernel function 
$K(\cdot)$ is the same as for (3.3), and the bandwidth is $h = 0.35$. The p-value was 0.310. Therefore, a single-index model need not be rejected.

Example 3.2. The data are the automobile collision data as analyzed 
by Hardle, Hall and Ichimura [16]. The sample size is n = 58. We also tested 
the elliptical symmetry of the distribution of the X-data, using the nonpara- 
metric Monte Carlo test of Zhu and Neuhaus [41]. The p- value was 0.25. 
This justifies the use of the Li-Duan method for estimating the projection direction $\beta$ for this data set. For a single-index fitting, the kernel function $K(\cdot)$ was again the same as for (3.3), the bandwidth was $h = 0.4$, while the
weight function was T^(x) = X]j=i The test statistic T„ was used and the 
asymptotic p-value was 0.32. The single-index model is therefore tenable. 

Fig. 4. (a) Fit to the bulls data: the projected data $\hat\beta^T X_j$ versus the linear fit (solid line) and the response data (dots); (b) the projected data versus the residuals.

4. Proofs. To prove Theorem 2.1, we expand our test statistic $T_n$ as

(4.1) $$\begin{aligned} n^{1/2} T_n &= \sum_{i=1}^n \hat\varepsilon_i W_i = \sum_{i=1}^n \bigl[Y_i - \hat\psi_n^{i}(F_n(\hat\beta^T X_i))\bigr] W_i \\ &= \sum_{i=1}^n \bigl[Y_i - Y_i^0 - \hat\psi_n^{i}(F_n(\hat\beta^T X_i)) + \hat\psi_{n0}^{i}(F_n(\hat\beta^T X_i))\bigr] W_i \\ &\quad + \sum_{i=1}^n \bigl[Y_i^0 - \hat\psi_{n0}^{i}(F_n(\hat\beta^T X_i))\bigr] W_i \equiv I + II, \end{aligned}$$

where $Y_i^0$ is computed under the null model $s = 0$, and $\hat\psi_{n0}^{i}$ is computed as $\hat\psi_n^{i}$ with the same $\hat\beta$ but with $Y_j^0$. The second sum will be further decomposed. For this, put

$$\tilde\psi_{n0}^{i}(u) = \frac{1}{(n-1)h} \sum_{j \ne i} Y_j^0 K\Bigl(\frac{u - U_j}{h}\Bigr).$$

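This function is based on the true $\beta$ and $F$ and is therefore unknown in practice. It will, however, play an important role in proofs, since it is close to $\hat\psi_{n0}^{i}$ and, on the other hand, is computed from independent observations. Write

$$II = \sum_{i=1}^n \bigl[Y_i^0 - \tilde\psi_{n0}^{i}(F(\beta^T X_i))\bigr] W_i + \sum_{i=1}^n \bigl[\tilde\psi_{n0}^{i}(F(\beta^T X_i)) - \hat\psi_{n0}^{i}(F_n(\hat\beta^T X_i))\bigr] W_i \equiv III + IV.$$

Observe that

$$\tilde\psi_{n0}^{i}(F(\beta^T X_i)) = \frac{1}{(n-1)h} \sum_{j \ne i} Y_j^0 K\Bigl(\frac{U_i - U_j}{h}\Bigr),$$

with

$$U_j = F(\beta^T X_j), \qquad j = 1, \ldots, n,$$

being independent and uniformly distributed on $[0, 1]$. Hence, $III$ is a U-statistic of degree two. Summarizing, we have

(4.2) $$\sum_{i=1}^n \hat\varepsilon_i W_i = I + III + IV.$$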

After standardization, term $I$ will be shown to tend to a limit which depends on the shift $s$ and, hence, will determine the local power of the test. As already mentioned, $III$ is a U-statistic of degree two, with a kernel depending on $h$, and hence on $n$. The term $IV$ is more complicated, since the kernel contains empirical quantities. After all, it will turn out that $III$ and $IV$ admit i.i.d. representations which will partly cancel out and jointly determine the (limit) distribution of $T_n$ under $H_0$. To carry out this program, note that both $\hat\psi_n^{i}$ and $\hat\psi_{n0}^{i}$ are evaluated at $F_n(\hat\beta^T X_i)$. Hence, the mathematical analysis of our test statistic requires a careful study of the terms

(4.3) $$K\Bigl(\frac{F_n(\hat\beta^T X_i) - F_n(\hat\beta^T X_j)}{h}\Bigr), \qquad 1 \le i \ne j \le n.$$
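To fix ideas about where the terms (4.3) enter, here is a minimal computational sketch of a score-type statistic of the form $n^{-1/2}\sum_i \hat\varepsilon_i W_i$. It rests on assumptions that go beyond what this section states explicitly: residuals are taken as $Y_i$ minus a leave-one-out kernel fit with divisor $(n-1)h$ evaluated at the rank-transformed index $F_n(\hat\beta^T X_i)$, ties are ignored, and the names `score_statistic` and `quartic_kernel` are hypothetical.

```python
import numpy as np

def quartic_kernel(u):
    """K(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def score_statistic(X, y, beta_hat, w, h):
    """Return (n^{-1/2} sum_i resid_i * w_i, resid) for rank-transformed indices."""
    n = len(y)
    index = X @ beta_hat
    ranks = np.argsort(np.argsort(index)) + 1     # ranks 1..n (no ties assumed)
    u_hat = ranks / n                             # F_n(beta_hat^T X_i)
    k = quartic_kernel((u_hat[:, None] - u_hat[None, :]) / h)   # the terms (4.3)
    np.fill_diagonal(k, 0.0)                      # leave-one-out: exclude j = i
    psi_hat = k @ y / ((n - 1) * h)               # assumed divisor (n - 1) h
    resid = y - psi_hat
    return np.sum(resid * w) / np.sqrt(n), resid
```

Because the kernel has support $[-1, 1]$, only pairs with $|F_n(\hat\beta^T X_i) - F_n(\hat\beta^T X_j)| \le h$ contribute, which is the restriction exploited in the analysis that follows.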
For this, denote by $F_n^\theta$ the empirical distribution function of $\theta^T X_1, \ldots, \theta^T X_n$. Hence, $F_n = F_n^\theta$ if $\theta = \hat\beta$. Since $K$ has compact support, say $[-1, 1]$, indices $i, j$ only contribute to (4.3) if

(4.4) $$|F_n^\theta(\theta^T X_j) - F_n^\theta(\theta^T X_i)| \le h, \qquad \theta = \hat\beta.$$

Since by assumption B(iv)

$$n^{1/2}(\hat\beta - \beta) = O_P(1),$$

for each given $\varepsilon > 0$, we may find a large constant $C$ such that

$$P(n^{1/2}\|\hat\beta - \beta\| > C) \le \varepsilon \quad \text{for all } n \ge 1.$$

In other words, up to a small event, $\hat\beta$ is contained in the $Cn^{-1/2}$-neighborhood of $\beta$. The first goal will be to analyze the effect of replacing $F_n(\hat\beta^T X_j)$ and $F_n(\hat\beta^T X_i)$ in (4.3) with $U_j = F(\beta^T X_j)$ and $U_i = F(\beta^T X_i)$, respectively, subject to (4.4). Introduce $F^\theta$, the distribution function of $\theta^T X$. Hence, $F = F^\theta$ for $\theta = \beta$.

In our first lemma we derive a maximal bound for $F^\theta - F^\beta$ evaluated at $\theta^T X_j$ and $\beta^T X_j$. Recall that, by assumption B(i), $E\|X\|^\gamma < \infty$. This implies that

$$\max_{1 \le i \le n} \|X_i\| = O_P(n^{a}) \quad \text{for } a = \gamma^{-1}.$$

For this reason, it will suffice to analyze all leading and error terms on the set where

(4.5) $$\max_{1 \le i \le n} \|X_i\| \le C_1 n^{a} \quad \text{for some large finite } C_1.$$

Denote by $\Theta$ the set of all $p$-vectors.

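Lemma 4.1. Put, for each $\theta \in \Theta$ and $1 \le j \le n$,

$$\alpha_j^\theta := F^\theta(\theta^T X_j) - F^\beta(\beta^T X_j).$$

We then have, on the set (4.5),

$$\max_{\|\theta - \beta\| \le Cn^{-1/2}} \ \max_{1 \le j \le n} |\alpha_j^\theta| = O(n^{-1/2+a}).$$

Proof. We shall first deal with an upper bound for the $\alpha_j^\theta$'s. Fix a possible value $x_j$ of $X_j$. Then

$$\begin{aligned} F^\theta(\theta^T x_j) - F^\beta(\beta^T x_j) &= P(\theta^T X \le \theta^T x_j) - P(\beta^T X \le \beta^T x_j) \\ &= P(\theta^T X \le \theta^T x_j, \beta^T X \le \beta^T x_j) + P(\theta^T X \le \theta^T x_j, \beta^T X > \beta^T x_j) \\ &\quad - P(\beta^T X \le \beta^T x_j) \le P(\theta^T X \le \theta^T x_j, \beta^T X > \beta^T x_j). \end{aligned}$$

Now, $\theta^T X \le \theta^T x_j$ implies

$$\begin{aligned} \beta^T X = \theta^T X + (\beta - \theta)^T X &\le \theta^T x_j + (\beta - \theta)^T X \\ &= \beta^T x_j + (\beta - \theta)^T (X - x_j) \le \beta^T x_j + Cn^{-1/2}\{\|X\| + \|x_j\|\}. \end{aligned}$$

Under (4.5) we therefore obtain, for each $1 \le j \le n$,

$$\alpha_j^\theta \le P(\beta^T x_j < \beta^T X \le \beta^T x_j + 2CC_1 n^{-1/2+a}) + P(\|X\| > C_1 n^{a}).$$

Since, by B(ii), $\beta^T X$ has a bounded density, the first probability is $O(n^{-1/2+a})$. As to the second probability, apply Markov's inequality to get

$$P(\|X\| > C_1 n^{a}) \le \frac{E\|X\|^\gamma}{C_1^\gamma n}.$$

This completes the proof of the upper bound. For the lower bound, just reverse the roles of $\theta$ and $\beta$. Now one needs the fact that the densities of $\theta^T X$ are uniformly bounded for all $\theta$ in a small neighborhood of $\beta$. □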

In the following lemma we investigate the local oscillations of the empirical process

$$(x, \theta) \to F_n^\theta(x) - F^\theta(x)$$

in a neighborhood of $\beta$. For this, introduce

$$G_n^\theta(x, y) := F_n^\theta(x) - F^\theta(x) - F_n^\beta(y) + F^\beta(y)$$

for $\theta \in \Theta$ and $x, y \in \mathbb{R}$ satisfying

(i) $\|\theta - \beta\| \le Cn^{-1/2}$,

(ii) $|x - y| \le C_1 n^{-1/2+a}$.

Lemma 4.2. Under the assumptions of Theorem 2.1, we have

$$\sup |G_n^\theta(x, y)| = O_P\bigl(\sqrt{n^{-3/2+a} \ln n}\bigr),$$

where the supremum extends over all $x, y$ and $\theta$ satisfying (i) and (ii).

Proof. The proof is a modification of the proof of Theorem 37 in [26], page 34. First note that the halfspaces form a class with a polynomial covering number. The measure of each set involved in the above supremum, $F^\theta(x) - F^\beta(y)$, is bounded from above in absolute value by

$$\begin{aligned} |P(\theta^T X \le x) - P(\beta^T X \le y)| &\le |P(\theta^T X \le x) - P(\beta^T X \le x)| \\ &\quad + |P(\beta^T X \le x) - P(\beta^T X \le y)| \le C_3 n^{-1/2+a}, \end{aligned}$$

by (i), (ii) and assumption B. For the first difference apply a technique already used in the proof of the previous lemma. If we replace the small $\varepsilon$ in Pollard's [26] Theorem 37 by a large $K > 0$ and set

$$\alpha_n^2 = \frac{\ln n}{n\delta_n^2}$$

therein, we obtain the required in-probability bound $O(\delta_n^2 \alpha_n)$, rather than a convergence rate to zero. Here $\delta_n^2$ equals the maximal measure of the included sets. Since $\delta_n^2 = O(n^{-1/2+a})$, the result follows. □

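In the next lemma, we expand $n^{-1/2} III$ into a sum of independent random variables plus a negligible error. The leading term will contribute to the limit of our test statistic when the null hypothesis is true. Recall

$$\bar W(u) = E[W_i \mid U_i = u].$$

Lemma 4.3. Under the assumptions of Theorem 2.1, we have in probability as $n \to \infty$,

$$\begin{aligned} S_{n1} := n^{-1/2} III &= n^{-1/2} \sum_{i=1}^n Y_i^0 W_i - \frac{1}{n^{1/2}(n-1)h} \sum_{i \ne j} Y_i^0 W_j K\Bigl(\frac{U_i - U_j}{h}\Bigr) \\ &= n^{-1/2} \sum_{i=1}^n \varepsilon_i [W_i - \bar W(U_i)] - n^{-1/2} \sum_{i=1}^n \{\psi(U_i)\bar W(U_i) - E[\psi(U_i)\bar W(U_i)]\} + o_P(1). \end{aligned}$$

Proof. $S_{n1}$ is a U-statistic of degree two with a kernel depending on $h$ and therefore on $n$. The Hajek projection of $Y_i^0 W_j K\bigl(\frac{U_i - U_j}{h}\bigr)$ equals

$$Y_i^0 \int_0^1 \bar W(v) K\Bigl(\frac{v - U_i}{h}\Bigr) dv + W_j \int_0^1 \psi(u) K\Bigl(\frac{U_j - u}{h}\Bigr) du - \int_0^1 \int_0^1 \bar W(v) \psi(u) K\Bigl(\frac{v - u}{h}\Bigr) dv\, du.$$

Conclude that the Hajek projection of $S_{n1}$ equals

$$\begin{aligned} \hat S_{n1} &= n^{-1/2} \sum_{i=1}^n Y_i^0 W_i - \frac{n^{-1/2}}{h} \sum_{i=1}^n Y_i^0 \int_0^1 \bar W(v) K\Bigl(\frac{v - U_i}{h}\Bigr) dv \\ &\quad - \frac{n^{-1/2}}{h} \sum_{j=1}^n W_j \int_0^1 \psi(u) K\Bigl(\frac{U_j - u}{h}\Bigr) du + \frac{n^{1/2}}{h} \int_0^1 \int_0^1 \bar W(v) \psi(u) K\Bigl(\frac{v - u}{h}\Bigr) dv\, du. \end{aligned}$$

Furthermore (see [27]),

$$E\{S_{n1} - \hat S_{n1}\}^2 = O((nh)^{-1}),$$

whence

$$S_{n1} - \hat S_{n1} = O_P((nh)^{-1/2}) = o_P(1).$$

Hence, it suffices to further expand $\hat S_{n1}$. For this, put

$$E_h = \int_0^1 \int_0^1 \bar W(u) \psi(v) K\Bigl(\frac{v - u}{h}\Bigr) dv\, du$$

and consider

$$\begin{aligned} R_{n1} &= n^{-1/2} \sum_{i=1}^n \Bigl\{ Y_i^0 \Bigl[\frac{1}{h} \int_0^1 \bar W(v) K\Bigl(\frac{v - U_i}{h}\Bigr) dv - \bar W(U_i)\Bigr] - \frac{E_h}{h} + E[Y_i^0 \bar W(U_i)] \Bigr\} \\ &\quad + n^{-1/2} \sum_{i=1}^n \Bigl\{ W_i \Bigl[\frac{1}{h} \int_0^1 \psi(u) K\Bigl(\frac{U_i - u}{h}\Bigr) du - \psi(U_i)\Bigr] - \frac{E_h}{h} + E[W_i \psi(U_i)] \Bigr\}. \end{aligned}$$

It may be written as a single sum of centered i.i.d. random variables. Its variance is bounded from above by the second moment of

$$Y_i^0 \Bigl[\frac{1}{h} \int_0^1 \bar W(v) K\Bigl(\frac{v - U_i}{h}\Bigr) dv - \bar W(U_i)\Bigr] + W_i \Bigl[\frac{1}{h} \int_0^1 \psi(u) K\Bigl(\frac{U_i - u}{h}\Bigr) du - \psi(U_i)\Bigr],$$

which is easily seen to go to zero as $h \to 0$. Conclude that $R_{n1} = o_P(1)$ and, therefore,

$$\begin{aligned} S_{n1} = \hat S_{n1} + o_P(1) &= n^{-1/2} \sum_{i=1}^n [Y_i^0 W_i - E(Y_i^0 W_i)] - n^{-1/2} \sum_{i=1}^n [W_i \psi(U_i) - E(W_i \psi(U_i))] \\ &\quad - n^{-1/2} \sum_{j=1}^n [Y_j^0 \bar W(U_j) - E(Y_1^0 \bar W(U_1))] \\ &\quad + n^{1/2} [E(W_1 \psi(U_1)) - h^{-1} E_h] + o_P(1). \end{aligned}$$

To complete the proof of the lemma, it suffices to show, in view of assumption C(i), that the last bracket is $O(h^2)$. But

$$\begin{aligned} E(W_1 \psi(U_1)) - h^{-1} E_h &= \int_0^1 \bar W(v) \Bigl[\psi(v) - \frac{1}{h} \int_0^1 \psi(u) K\Bigl(\frac{v - u}{h}\Bigr) du\Bigr] dv \\ &= \int_0^1 \bar W(v) \Bigl[\psi(v) - \int_{(v-1)/h}^{v/h} \psi(v - sh) K(s)\, ds\Bigr] dv. \end{aligned}$$

For $h \le v \le 1 - h$, the inner integral extends over the whole support of $K$, namely, $[-1, 1]$. Using the facts that $K$ is symmetric at zero, $\int_{-1}^1 K(s)\, ds = 1$ and $\psi$ is twice continuously differentiable, Taylor's expansion yields that the difference is uniformly in $h \le v \le 1 - h$ of the order $O(h^2)$. For $0 \le v < h$ (and similarly for $1 - h < v \le 1$), the difference is $O(h)$. Since, however, $0 \le v < h$ has Lebesgue measure $h$, we also obtain the upper bound $h^2$ for this part of the integral. □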
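The $O(h^2)$ order of the interior smoothing bias can be checked numerically. The sketch below is an illustration only: it uses the quartic kernel from (3.3), for which $\int s^2 K(s)\,ds = 1/7$, so that Taylor's expansion gives $\int \psi(v - sh)K(s)\,ds - \psi(v) \approx h^2 \psi''(v)/14$ at interior points. The test function $\psi = \sin$ and the evaluation point are arbitrary choices, and `smoothing_bias` is a hypothetical helper.

```python
import numpy as np

def quartic_kernel(s):
    """K(s) = 15/16 (1 - s^2)^2 for s in [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - s**2) ** 2

def smoothing_bias(psi, v, h, m=20001):
    """Integral of psi(v - s h) K(s) over [-1, 1] minus psi(v), by the trapezoid rule."""
    s = np.linspace(-1.0, 1.0, m)
    vals = psi(v - s * h) * quartic_kernel(s)
    ds = s[1] - s[0]
    integral = ds * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    return integral - psi(v)

# The ratio bias / h^2 should stabilize near psi''(v) / 14 as h decreases.
for h in (0.2, 0.1, 0.05):
    print(h, smoothing_bias(np.sin, 0.5, h) / h**2)
```

Near the boundary ($0 \le v < h$ or $1 - h < v \le 1$) the kernel is truncated and the same computation would show a bias of order $h$ only, which is why the proof treats those strips separately.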

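The quantity $S_{n2}$ introduced and studied below will be the leading term for $n^{-1/2} I$ with $I$ from the expansion (4.2).

Lemma 4.4. Under the assumptions of Theorem 2.1, we have in probability as $n \to \infty$,

$$S_{n2} := \frac{1}{n} \sum_{i=1}^n s(X_i) W_i - \frac{1}{n(n-1)h} \sum_{i \ne j} s(X_i) W_j K\Bigl(\frac{U_i - U_j}{h}\Bigr) \to E\{[s(X) - E(s(X) \mid U)] W(X)\} = \mu.$$

Proof. $S_{n2}$ is a U-statistic of degree two. Recall $\bar s(u) = E(s(X) \mid U = u)$. The Hajek projection of $s(X_i) W_j K\bigl(\frac{U_i - U_j}{h}\bigr)$ equals

$$s(X_i) \int_0^1 \bar W(v) K\Bigl(\frac{v - U_i}{h}\Bigr) dv + W_j \int_0^1 \bar s(u) K\Bigl(\frac{U_j - u}{h}\Bigr) du - \int_0^1 \int_0^1 \bar s(u) \bar W(v) K\Bigl(\frac{v - u}{h}\Bigr) dv\, du.$$

Hence, the projection of $S_{n2}$ equals

$$\begin{aligned} \hat S_{n2} &= \frac{1}{n} \sum_{i=1}^n s(X_i) W_i - \frac{1}{nh} \sum_{i=1}^n s(X_i) \int_0^1 \bar W(v) K\Bigl(\frac{v - U_i}{h}\Bigr) dv \\ &\quad - \frac{1}{nh} \sum_{j=1}^n W_j \int_0^1 \bar s(u) K\Bigl(\frac{U_j - u}{h}\Bigr) du + \frac{1}{h} \int_0^1 \int_0^1 \bar s(u) \bar W(v) K\Bigl(\frac{v - u}{h}\Bigr) dv\, du. \end{aligned}$$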



28 W. STUTE AND L.-X. ZHU 

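Furthermore, $E\{S_{n2} - \hat S_{n2}\}^2$ is of the order $O(n^{-2} h^{-1}) = o(1)$.

Hence, it remains to show that $\hat S_{n2}$ tends to the desired limit. Now similar to the proof of the previous lemma, it may be shown that

$$\hat S_{n2} - \frac{1}{n} \sum_{i=1}^n s(X_i) W_i + n^{-1} \sum_{i=1}^n s(X_i) \bar W(U_i) + n^{-1} \sum_{j=1}^n W_j \bar s(U_j) - \int_0^1 \bar s(u) \bar W(u)\, du \to 0 \quad \text{in probability.}$$

The assertion of the lemma now is a straightforward consequence of the law of large numbers upon noticing that

$$E[s(X) \bar W(U)] = \int_0^1 \bar s(u) \bar W(u)\, du. \qquad \Box$$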



The next lemma will be helpful to find the final expansion and limit of $I$.

Lemma 4.5. Under the assumptions of Theorem 2.1, we have

$$S_{n3} := \frac{1}{n(n-1)h} \sum_{i \ne j} s(X_i) W_j \Bigl[ K\Bigl(\frac{F_n(\hat\beta^T X_j) - F_n(\hat\beta^T X_i)}{h}\Bigr) - K\Bigl(\frac{U_j - U_i}{h}\Bigr) \Bigr] \to 0$$

in probability as $n \to \infty$.

Proof. By Taylor's formula,

$$K\Bigl(\frac{F_n(\hat\beta^T X_j) - F_n(\hat\beta^T X_i)}{h}\Bigr) - K\Bigl(\frac{U_j - U_i}{h}\Bigr) = K'(\Delta_{ij}) \frac{F_n(\hat\beta^T X_j) - F_n(\hat\beta^T X_i) - U_j + U_i}{h},$$

where $\Delta_{ij}$ is between the two $K$-ratios in the definition of $S_{n3}$. For each $j$ (and similarly for $i$),

$$\begin{aligned} |F_n(\hat\beta^T X_j) - U_j| &\le |F_n(\hat\beta^T X_j) - F^{\hat\beta}(\hat\beta^T X_j)| + |\alpha_j^{\hat\beta}| \\ &\le \sup_{t, \theta} |F_n^\theta(t) - F^\theta(t)| + \sup_{\theta} \max_{1 \le j \le n} |\alpha_j^\theta|, \end{aligned}$$

where the suprema extend (with large probability) over the set of $\theta$'s with $\|\theta - \beta\| \le Cn^{-1/2}$. Now it is well known that empirical measures approach the true measure at the rate $O_P(n^{-1/2})$ uniformly over the class of all halfspaces. See, for example, [26]. In other words, the first supremum is $O_P(n^{-1/2})$. From Lemma 4.1, the second supremum is $O(n^{-1/2+a})$ uniformly in $1 \le j \le n$. Conclude that

(4.6) $$\sup_{1 \le j \le n} |F_n(\hat\beta^T X_j) - U_j| = O_P(n^{-1/2+a}) = o_P(h).$$

Furthermore, since $K$ has support $[-1, 1]$, the summation in $S_{n3}$ takes place only w.r.t. those $i, j$ for which at least one of the ratios falls into $[-1, 1]$. If this happens to be true for the first ratio, then by (4.6) also

$$|U_j - U_i| \le C_3 h,$$

with large probability for some appropriate $C_3$. Summarizing, since $K'$ is bounded, we get, with large probability,

$$|S_{n3}| \le C\, \frac{n^{-1/2+a}}{n(n-1)h^2} \sum_{i \ne j} |s(X_i) W_j|\, 1_{\{|U_j - U_i| \le C_3 h\}}.$$

The expectation of the right-hand side is, however, of the order $O(n^{-1/2+a} h^{-1}) = o(1)$. This completes the proof of the lemma. □

We are now ready to analyze the term $I$. From its definition we have

$$n^{-1/2} I = \frac{1}{n} \sum_{i=1}^n s(X_i) W_i - \frac{1}{n(n-1)h} \sum_{i \ne j} s(X_i) W_j K\Bigl(\frac{F_n(\hat\beta^T X_j) - F_n(\hat\beta^T X_i)}{h}\Bigr).$$

In view of Lemmas 4.4 and 4.5 we therefore get the following result.

Corollary 4.6. Under the assumptions of Theorem 2.1, we have $n^{-1/2} I \to \mu$ in probability.

To summarize the results obtained so far, Lemma 4.3 yielded an i.i.d. representation of $n^{-1/2} III$, while Corollary 4.6 provided an in-probability limit for $n^{-1/2} I$. The analysis for $n^{-1/2} IV$ is a bit tricky. At the end it will turn out that it admits an i.i.d. expansion which cancels with the second sum in Lemma 4.3. We may thus conclude that

$$T_n = \mu + n^{-1/2} \sum_{i=1}^n \varepsilon_i [W_i - \bar W(U_i)] + o_P(1),$$

which coincides with the i.i.d. representation (2.6) of Theorem 2.1. So it remains to show the following representation of $n^{-1/2} IV$.






Lemma 4.7. Under the assumptions of Theorem 2.1,

$$n^{-1/2} IV = n^{-1/2} \sum_{j=1}^n \{\Phi(\beta^T X_j) \bar W(U_j) - E[\Phi(\beta^T X_j) \bar W(U_j)]\} + o_P(1).$$



Proof. By Taylor's expansion, 
1 



n 



V2(n- 



(4.7) 



n 



i=ij=i 

1 



K 



Uj - Uj 
h 



nV2(„-i)/i 

n n 

i=ij=i 



+ 



/l2 ' 



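where $\Delta_{ij}$ is between the two $K$-ratios in the representation of $IV$. We shall show that the second double sum is negligible, while the first contributes to the i.i.d. representation of $T_n$. First, we write

$$\begin{aligned} F_n(\hat\beta^T X_j) &= F_n(\hat\beta^T X_j) - F^{\hat\beta}(\hat\beta^T X_j) - F_n^\beta(\beta^T X_j) + F^\beta(\beta^T X_j) \\ &\quad + F_n^\beta(\beta^T X_j) + F^{\hat\beta}(\hat\beta^T X_j) - F^\beta(\beta^T X_j), \end{aligned}$$

and similarly for the index $i$. The first line equals, with $\theta = \hat\beta$ and $x = \hat\beta^T X_j$, $y = \beta^T X_j$, the quantity $G_n^\theta(x, y)$ appearing in Lemma 4.2. Conclude from that result that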



(4.8) 



nV2(n_i)/i2^^^^ 



K' 



h 



1 



n' 



,fUj-U^ 



K' 



h 



i=ij=i 

The double sum is easily seen to be bounded in probability. Since 

^-l/2+a In^^o, 
this proves that (4.8) tends to zero in probability. Next we study

$$\frac{1}{n^{1/2}(n-1)h^2} \sum_{i=1}^n \sum_{j=1}^n Y_i^0 W_j K'\Bigl(\frac{U_j - U_i}{h}\Bigr) \bigl[F_n^\beta(\beta^T X_j) - U_j - F_n^\beta(\beta^T X_i) + U_i\bigr].$$

This sum is a V-statistic (see [27]), with a kernel depending on $h$ and hence on $n$. It is asymptotically equal to a U-statistic whose Hajek projection equals

$$h^{-2} \int_0^1 \int_0^1 \psi(u) \bar W(v) K'\Bigl(\frac{v - u}{h}\Bigr) [\alpha_n(v) - \alpha_n(u)]\, du\, dv.$$

Here, $\alpha_n$ is the (uniform) empirical process pertaining to the $U_j$'s. Transformation of integrals, C-tightness of $\alpha_n$, $n \ge 1$, and the fact that $K'$ has compact support $[-1, 1]$ yield that the last double integral is equivalent to

$$h^{-1} \int_0^1 \int_{-1}^1 \psi(v - wh) \bar W(v) K'(w) [\alpha_n(v) - \alpha_n(v - wh)]\, dw\, dv.$$

By continuity of $\psi$, this is asymptotically equivalent to

(4.9) $$h^{-1} \int_0^1 \int_{-1}^1 \psi(v) \bar W(v) K'(w) [\alpha_n(v) - \alpha_n(v - wh)]\, dw\, dv = -h^{-1} \int_0^1 \int_{-1}^1 \psi(v) \bar W(v) K'(w) \alpha_n(v - wh)\, dw\, dv.$$

Check that

$$\frac{1}{h} \int_{-1}^1 K'(w) \alpha_n(v - wh)\, dw = \sqrt{n}\,[\hat f_n(v) - 1].$$

Here

$$\hat f_n(v) = \frac{1}{nh} \sum_{i=1}^n K\Bigl(\frac{v - U_i}{h}\Bigr)$$

is the kernel density estimator for the uniform sample $U_1, \ldots, U_n$. Hence, (4.9) equals

(4.10) $$-\sqrt{n} \int_0^1 \psi(v) \bar W(v) [\hat f_n(v) - 1]\, dv.$$

Introducing the smoothed empirical distribution,

$$d\hat F_n = \hat f_n\, dv,$$

and the pertaining empirical process $\hat\alpha_n = \sqrt{n}(\hat F_n - \mathrm{Id})$, where Id denotes the identity function on $(0, 1)$, (4.10) becomes

$$-\int_0^1 \psi(v) \bar W(v)\, \hat\alpha_n(dv).$$

It is known that

(4.11) $$\int_0^1 \psi \bar W\, d\hat\alpha_n = \int_0^1 \psi \bar W\, d\alpha_n + o_P(1).$$
A simple proof of (4.11) may be obtained by using oscillation results for empirical processes; see [30]. We shall shortly see that all other terms will be negligible for the i.i.d. representation of $n^{-1/2} IV$, so that

(4.12) $$n^{-1/2} IV = \int_0^1 \psi \bar W\, d\alpha_n + o_P(1),$$

as desired. To justify (4.12), we next bound (4.13), where

$$\tilde U_j = F^{\hat\beta}(\hat\beta^T X_j), \qquad 1 \le j \le n.$$

Hence, the $U_j$ and $\tilde U_j$ incorporate the theoretical distribution functions at $\theta = \beta$ and $\theta = \hat\beta$, respectively. From Lemma 4.1,

(4.14) $$\max_{1 \le j \le n} |\tilde U_j - U_j| = O_P(n^{-1/2+a}).$$



This bound will sometimes be helpful to further simplify (4.13). First, be- 
cause K' is an odd function, (4.13) may be written as 



(4.15) - 



1 



h 



We shall only deal with the sum involving Y^Wj, the other being dealt 
with in a similar way. Now 



1 



1 



,( U,-Ui 

h 
In the first two double series, first apply (4.14) to bound $|\tilde U_j - U_j|$ uniformly in $j$. The expectation of, for example,



1 



is easily seen to be bounded. Similarly for the second series. Conclude that 
each sum is 



Op(/i-^n-i/2+")=op(l). 

h 



As to the last $j$-sum, substitute $w = (U_j - u)/h$, apply Taylor's expansion to $\psi(U_j - wh)$ and use the fact that

$$\int_{-1}^1 K'(w)\, dw = 0, \qquad \int_{-1}^1 w K'(w)\, dw = -1$$

to finally get that the last sum equals
1 " 



-=^vF,y([/,)(f/,-f/,)+op(i) 
1 " - 

Similar arguments yield for the double sum in (4.15) including the factors 
YjWi the representation 

1 " - 

-= W\U,)m,m - Uj) + op(l). 
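Conclude that so far we have shown that (4.13) equals

(4.16) $$n^{-1/2} \sum_{i=1}^n (\bar W \psi)'(U_i)(\tilde U_i - U_i) + o_P(1).$$

At this point we see that another simple application of (4.14), even for bounded $X$'s, that is, $a = 0$, does not yield an $o_P(1)$ term. Therefore, we have to analyze $\tilde U_j$ and $U_j$ in a different way. As we shall see, finally, and in a disguised form, we take advantage of the fact that, for each $\theta$, every projection of $X$ is transformed into a uniform random variable $F^\theta(\theta^T X)$. Fix such a $\theta$ and note that, for a random vector $X$ with the same distribution as $X_1$ but being independent of the sample $(X_i, Y_i)$, $1 \le i \le n$, one gets

$$F^\theta(\theta^T X_j) = E\{1_{\{\theta^T X \le \theta^T X_j\}} \mid \mathcal{F}_n\}.$$

Here $\mathcal{F}_n = \sigma(X_i, Y_i, 1 \le i \le n)$ is the $\sigma$-field generated by the observations. Conclude that, for $\theta = \hat\beta$,

$$\begin{aligned} \tilde U_j - U_j &= E\{1_{\{\theta^T X \le \theta^T X_j\}} - 1_{\{\beta^T X \le \beta^T X_j\}} \mid \mathcal{F}_n\} \\ &= E\{1_{\{\theta^T X \le \theta^T X_j\}} - 1_{\{\theta^T X \le \beta^T X_j\}} \mid \mathcal{F}_n\} \\ &\quad + E\{1_{\{\theta^T X \le \beta^T X_j\}} - 1_{\{\beta^T X \le \beta^T X_j\}} \mid \mathcal{F}_n\}, \end{aligned}$$

whence

$$n^{-1/2} \sum_{i=1}^n (\bar W \psi)'(U_i)(\tilde U_i - U_i)$$

(4.17) $$= n^{-1/2} \sum_{i=1}^n (\bar W \psi)'(U_i)(\hat\beta - \beta)^T X_i f(\beta^T X_i) + o_P(1)$$

(4.18) $$\quad + E\Bigl\{ n^{-1/2} \sum_{i=1}^n (\bar W \psi)'(U_i) \bigl[1_{\{\theta^T X \le \beta^T X_i\}} - 1_{\{\beta^T X \le \beta^T X_i\}}\bigr] \Bigm| \mathcal{F}_n \Bigr\}.$$

The process inside the conditional expectation is, after centering, asymptotically C-tight. With $\theta = \hat\beta \to \beta$, we therefore obtain

$$\begin{aligned} E\{\cdots \mid \mathcal{F}_n\} &= n^{1/2} E\Bigl\{ \int_{F(\theta^T X)}^{F(\beta^T X)} (\bar W \psi)'(u)\, du \Bigm| \mathcal{F}_n \Bigr\} + o_P(1) \\ &= n^{1/2} (\beta - \hat\beta)^T E\{(\bar W \psi)'(U) X f(\beta^T X)\} + o_P(1), \end{aligned}$$

where the last equality follows from the mean value theorem, $n^{1/2}(\hat\beta - \beta) = O_P(1)$ and the facts that $\hat\beta$ is measurable with respect to $\mathcal{F}_n$ and $X$ is independent of $\mathcal{F}_n$. Inserting this in (4.17) and (4.18), we thus get

$$n^{-1/2} \sum_{i=1}^n (\bar W \psi)'(U_i)(\tilde U_i - U_i) = n^{1/2} (\hat\beta - \beta)^T n^{-1} \sum_{j=1}^n \{(\bar W \psi)'(U_j) X_j f(\beta^T X_j) - E[\cdots]\} + o_P(1).$$

Since $n^{1/2}(\hat\beta - \beta)$ is stochastically bounded and the sample mean tends to zero according to the SLLN, this shows that (4.16) tends to zero in probability.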

It remains to bound (4.7), but this is easy. In view of Lemma 4.2, upon applying by now standard arguments, we have

$$|(4.7)| = o_P(1).$$

This completes the proof of Lemma 4.7. □